-
- Additions from d.g. gilbert (gilbertd@bio.indiana.edu). These modifications
- can be picked up via ftp or gopher to ftp.bio.indiana.edu.
- Look in folder IUBio Software+Data/util/wais for iubio-wais-8b5.tar.Z
- (full source) or iubio-wais-8b5.patch for a difference or
- patch file.
-
-
- These additions include
- BOOLEANS == boolean 'and' and 'not' operators. This is a simple implementation:
- there are no nesting symbols, and words are evaluated from left to right in
- the wais query string. When a 'not' operator is found, the following
- single word is moved into a buffer of not-words. This not-word buffer
- is evaluated after all the other words are evaluated. If a document matches
- a not-word, that document is removed from the set of matches (given a
- score of zero).
- When an 'and' word is found, the word following it is checked for matches
- to documents, and the set of documents matching this and-word is
- compared to the set of documents matching any words prior to the and-word.
- The intersection of prior and current matching documents is retained; the others
- are removed (set to a zero score).
- Only the waisserver is affected by BOOLEANS.
-
- For example, this query
- red and blue or yellow but not green and orange or black but not white
-
- will be interpreted like this (the parentheses just show the implicit left-to-right
- interpretation):
- ((((((red and blue) or yellow) and orange) or black) not green) not white)
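-
- The left-to-right rule above can be sketched in a few lines of Python. This is
- only an illustration, not the waisserver code. It assumes that plain words are
- combined by union, that 'or' and 'but' are skipped as noise words (as the
- interpretation above suggests), and that a hypothetical INDEX dict maps each
- word to the set of matching document ids. Scores are reduced to set membership.
-
```python
# Assumption: 'or' and 'but' carry no meaning of their own and are skipped.
NOISE = {"or", "but"}

def evaluate(query, index):
    """Evaluate a query left to right; index maps word -> set of document ids."""
    matches = set()
    not_words = []                      # buffer of words that followed 'not'
    tokens = query.lower().split()
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        nxt = tokens[i + 1] if i + 1 < len(tokens) else None
        if tok == "not" and nxt is not None:
            not_words.append(nxt)       # deferred until all other words are done
            i += 2
        elif tok == "and" and nxt is not None:
            # keep only documents matching both the prior words and the and-word
            matches &= index.get(nxt, set())
            i += 2
        elif tok in NOISE:
            i += 1
        else:
            matches |= index.get(tok, set())
            i += 1
    for word in not_words:              # not-words are applied last
        matches -= index.get(word, set())
    return matches

# Hypothetical toy index, for illustration only.
INDEX = {
    "red": {1, 2, 7}, "blue": {2, 3}, "yellow": {4}, "green": {4, 5},
    "orange": {2, 4, 6}, "black": {5, 8}, "white": {8},
}
QUERY = "red and blue or yellow but not green and orange or black but not white"
result = evaluate(QUERY, INDEX)
```
-
- With this toy index only document 2 survives: document 4 is dropped by 'not green'
- and document 8 by 'not white', even though both operators appear mid-query.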
-
- PARTIALWORD == The asterisk symbol '*' is parsed, if it immediately follows a word,
- as a key to search for all words that begin with the part of the word
- before the asterisk.
- Only the waisserver is affected by PARTIALWORD.
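-
- One way to picture the partial-word expansion is a prefix scan over a sorted
- word list. The Python sketch below is an illustration under that assumption,
- not the method used in the patch; expand_partial and WORDS are hypothetical names.
-
```python
import bisect

def expand_partial(term, sorted_words):
    """Return the indexed words matched by term; a trailing '*' means prefix match."""
    if not term.endswith("*") or len(term) == 1:
        # no usable prefix: fall back to an exact lookup
        return [term] if term in sorted_words else []
    prefix = term[:-1]
    # first position whose word is >= prefix; all prefix matches are contiguous from here
    start = bisect.bisect_left(sorted_words, prefix)
    found = []
    for word in sorted_words[start:]:
        if not word.startswith(prefix):
            break
        found.append(word)
    return found

# Hypothetical indexed vocabulary, already sorted.
WORDS = ["hum", "human", "hummingbird", "hungry", "mouse"]
expanded = expand_partial("hum*", WORDS)
```
-
- Here 'hum*' expands to hum, human and hummingbird; each expanded word would then
- be searched as an ordinary query word.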
-
- LITERAL == The single-quote or double-quote symbols are interpreted, if they occur
- in pairs around a string, as requesting a literal match of the enclosed
- string. Any special symbols, 'and', 'not', and '*' inside a literal string
- are not interpreted. The first part of a literal string must be a
- word that is indexed for that data, rather than a delimiter symbol, for
- a match to be found.
- Only the waisserver is affected by LITERAL.
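-
- The pairing rule can be illustrated with a small query tokenizer. This Python
- sketch is an assumption about how such parsing could look, not the patch's
- parser: text between a matched pair of quotes becomes one literal token, so
- 'and', 'not' and '*' inside it are left alone, and an unpaired quote is
- treated as part of an ordinary word.
-
```python
import re

# "double-quoted" | 'single-quoted' | any other run of non-blank characters
TOKEN = re.compile(r'"([^"]*)"|\'([^\']*)\'|(\S+)')

def tokenize(query):
    """Split a query into ('literal', text) and ('word', text) tokens."""
    tokens = []
    for double_quoted, single_quoted, bare in TOKEN.findall(query):
        if bare:
            tokens.append(("word", bare))
        else:
            # contents of a paired quote are kept verbatim, operators and all
            tokens.append(("literal", double_quoted or single_quoted))
    return tokens

tokens = tokenize('red and "black and white" not hum*')
```
-
- Only tokens tagged 'word' would be checked for 'and', 'not' or a trailing '*'.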
-
- BIO == these are a set of changes that include
- - optional 'stoplist' file of words to ignore,
- - an optional set of symbols that are used as delimiters, with other
- graphic characters being used as valid word symbols,
- - a selection of biology data document structures
- BIO affects waisindex as well as waisserver.
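-
- The stoplist and delimiter options can be pictured with the Python sketch below.
- It is an illustration only: the delimiter set, the stoplist contents and the
- function name are hypothetical, not the defaults used by the BIO patches.
- Every character that is not a delimiter is kept as part of a word, so symbols
- such as '-' stay inside gene or protein names.
-
```python
def index_words(text, delimiters=" \t\n,;()", stoplist=frozenset()):
    """Split text on the delimiter characters and drop words found in the stoplist."""
    words = []
    current = []
    for ch in text:
        if ch in delimiters:
            if current:                 # a delimiter ends the word in progress
                words.append("".join(current))
                current = []
        else:
            current.append(ch)          # any other character is a valid word symbol
    if current:
        words.append("".join(current))
    return [w for w in words if w.lower() not in stoplist]

indexed = index_words("the IL-2 receptor, alpha (CD25)", stoplist={"the"})
```
-
- With these settings 'IL-2' is indexed as a single word and 'the' is ignored.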
-
- Internet Gopher users: You will need my patch to the GopherD Waisindex.c
- to use these BIO patches from Internet Gopher. See ftp.bio.indiana.edu,
- /util/gopher/iubio-gopher-v1.patch for these changes.
-
- Some general restructuring of the wais-8-b5 code was done also, to make
- parameter passing easier for user-defined data files. This should not cause
- any functional difference from the 8-b5 code when the Makefile defines are
- turned off.
-
- I consider the BOOLEANS modification to be the simplest and smallest change
- to the wais code. The LITERAL and PARTIALWORD modifications are somewhat more complex, but
- still fairly restricted in scope, affecting only a few waisserver modules. The
- BIO modifications involve changes in many areas of the wais code.
-
- At least a couple of problems remain:
- The wais-8b5 code will not properly index words that occur more
- than 32,000 times. The Genbank biology databank that I serve
- here has > 80,000 records, and some important index words occur
- more than this limit. The wais release 8b3 was amenable to
- raising this limit, but 8b5 is not. I'll continue to look
- into this problem.
-
- The partial word matching addition of mine is imperfect -- it
- misses some words that should match. I'd appreciate help from anyone
- who can look into this. I'll also work on it
- as time permits.
-
-
- See Makefile to enable/disable these additions. You should be able to have
- functionally equivalent programs to wais-8-b5 by turning off these defines,
- or enable them individually as needed.
-
- ........
- # dgg additions
- # LITERAL == waisserver, search for "literal strings"
- # BOOLEANS == waisserver, search with boolean AND, NOT operators
- # PARTIALWORD == waisserver, search for partial words, hum* matches human, hummingbird, ...
- # BIO == waisindex, waisserver changes including symbol indexing & search & bio data formats
- #
- #CFLAGS = -g -I$(SUPDIR) -DSECURE_SERVER -DRELEVANCE_FEEDBACK -DUSG
- CFLAGS = -g -I$(SUPDIR) -DSECURE_SERVER -DRELEVANCE_FEEDBACK -DUSG -DBIO -DBOOLEANS -DPARTIALWORD -DLITERAL
-